Improving Mention Detection Robustness to Noisy Input

Authors

  • Radu Florian
  • John F. Pitrelli
  • Salim Roukos
  • Imed Zitouni
Abstract

Information-extraction (IE) research typically focuses on clean-text inputs. However, an IE engine serving real applications yields many false alarms due to less-well-formed input. For example, IE in a multilingual broadcast processing system has to deal with inaccurate automatic transcription and translation. The resulting presence of non-target-language text in this case, and non-language material interspersed in data from other applications, raise the research problem of making IE robust to such noisy input text. We address one such IE task: entity-mention detection. We describe augmenting a statistical mention-detection system in order to reduce false alarms from spurious passages. The diverse nature of input noise leads us to pursue a multi-faceted approach to robustness. For our English-language system, at various miss rates we eliminate 97% of false alarms on inputs from other Latin-alphabet languages. In another experiment, representing scenarios in which genre-specific training is infeasible, we process real financial-transactions text containing mixed languages and data-set codes. On these data, because we do not train on similar data, we achieve a smaller but significant improvement. These gains come with virtually no loss in accuracy on clean English text.

Similar resources

On the Detection of Unknown Input in Positional Control Problems with Noisy Measurements

The paper examines the conditions for isolating the unknown input detection from the effects of the measurement noise in the important family of positional control problems. The study is motivated by the need to improve the homing performance of interceptor missiles against randomly maneuvering targets. The required isolation is possible if, in addition to noisy relative position measurement...

Detecting Effectiveness of Outliers and Noisy Data on Fuzzy System Using FCM

Fuzzy systems, an artificial-intelligence technique, are applicable to control and decision-support systems. Fuzzy systems are created using membership functions (MFs), which are modeled on a dataset. Therefore, there is a relation between the uncertainty of the input data and the fuzziness expressed by MFs. Outliers and noisy data are kinds of uncertainty that affect the membership function. Thus...

Towards improving speech detection robustness for speech recognition in adverse conditions

Recognition performance decreases when recognition systems are used over the telephone network, especially over wireless networks and in noisy environments. It appears that inefficient speech/non-speech detection (SND) is an important source of this degradation. Therefore, speech detection robustness to noise is a challenging problem to be examined, in order to improve recognition performance for the ...

Noisy images edge detection: Ant colony optimization algorithm

The edges of an image define the image boundary. When the image is noisy, it is not easy to identify the edges. Therefore, a method needs to be developed that can identify edges clearly in a noisy image. Many methods have been proposed earlier using filters, transforms and wavelets with Ant colony optimization (ACO) that detect edges. We here used ACO for edge detection of noisy ima...

A Noisy Channel Approach to Error Correction in Spoken Referring Expressions

We offer a noisy channel approach for recognizing and correcting erroneous words in referring expressions. Our mechanism handles three types of errors: it removes noisy input, inserts missing prepositions, and replaces mis-heard words (at present, they are replaced by generic words). Our mechanism was evaluated on a corpus of 295 spoken referring expressions, improving interpretation performance.

Publication date: 2010